UniFI logo

SKKU AORC 2017

abstract
A suite of tools to interact with the OEIS

In [2]:
__AUTHOR__ = ('Massimo Nocentini',
              'massimo.nocentini@unifi.it',
              'https://github.com/massimo-nocentini/')

__ADVISOR__ = ('prof. Donatella Merlini',
               'donatella.merlini@unifi.it')

__SELF__ = github_io + '/PhD/skku-aorc-2017/oeistools.html#/'

toc

  • suite of console programs
    • crawler
    • pretty printer
    • grapher
  • API and Jupyter notebook interfaces

crawler

  • a console program
  • fetch sequences recursively, according to their cross refs
  • state of the art impl respect to asynchronous
    • requires Python 3.6, namely the latest release
    • no threads, race conditions, data sync
    • lies on async/await Python primitives only
    • 250 lines of code
  • cache a portion of the OEIS to speed up repeated lookups
  • restart from the cache you have already fetched
  • support configuration of degree of parallelism
    • too fast, dude! don't let the OEIS ban you ;)

help & usage

the following is the companion description, in Unix style:

In [3]:
!python3.6 ../../src/crawling.py --help
usage: crawling.py [-h] [--clear-cache] [--restart] [--workers WORKERS]
                   [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                   [--cache-dir CACHE_DIR] [--progress-mark PROGRESS_MARK]
                   [S [S ...]]

OEIS Crawler.

positional arguments:
  S                     Sequence to fetch, given in the form Axxxxxx

optional arguments:
  -h, --help            show this help message and exit
  --clear-cache         Clear cache of sequences, according to --cache-dir
  --restart             Build fringe from cached sequences (defaults to False)
  --workers WORKERS     Degree of parallelism (defaults to 10)
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Logger verbosity (defaults to ERROR)
  --cache-dir CACHE_DIR
                        Cache directory (defaults to ./fetched/)
  --progress-mark PROGRESS_MARK
                        Symbol for fetched event (defaults to ●)

fetching

commando to download two sequences, the Fibonacci and Catalan numbers respectively:

In [9]:
!python3.6 ../../src/crawling.py A000045 A000108 --cache-dir=../../src/fetched/
●●●●●●●●●●●^C

fetched 12 new sequences:
{'A000108', 'A002420', 'A003046', 'A000081', 'A167893', 'A033184', 'A032443', 'A038003', 'A001791', 'A120303', 'A000045', 'A014138'}

restarting

if the cache contains sequences, we restart from the set of their cross refs, recursively:

In [10]:
!python3.6 ../../src/crawling.py --restart --cache-dir=../../src/fetched/
●●●●●●●●●●●●●●●●●^C

fetched 18 new sequences:
{'A003519', 'A091867', 'A036355', 'A068231', 'A047072', 'A038575', 'A051575', 'A105317', 'A071679', 'A099731', 'A000744', 'A059365', 'A104597', 'A137697', 'A120274', 'A000169', 'A001699', 'A100257'}

cache summary

the following shows cache status and its content:

In [11]:
!python3.6 ../../src/crawling.py --cache-dir=../../src/fetched/
30 sequences in cache ../../src/fetched/
260 sequences in fringe for restarting
In [17]:
!ls ../../src/fetched/
A000045.json A001699.json A014138.json A038575.json A071679.json A105317.json
A000081.json A001791.json A032443.json A047072.json A091867.json A120274.json
A000108.json A002420.json A033184.json A051575.json A099731.json A120303.json
A000169.json A003046.json A036355.json A059365.json A100257.json A137697.json
A000744.json A003519.json A038003.json A068231.json A104597.json A167893.json

pprinter

  • a proxy for searching in the OEIS
  • tabular representation of data sections
    • one and two dim matrix notation
  • filters application on most result's sections
  • takes advantage of cached sequences
  • both console and notebook interfaces

help & usage

the following is the companion description, in Unix style:

In [18]:
!python3.6 ../../src/pprinting.py --help
usage: pprinting.py [-h]
                    (--id ID | --seq SEQ | --query QUERY | --most-recents M)
                    [--force-fetch] [--cache-dir CACHE_DIR] [--tables-only]
                    [--start-index S] [--max-results R] [--data-only]
                    [--upper-limit U] [--comment-filter C]
                    [--formula-filter F] [--xrefs-filter X] [--link-filter L]
                    [--cite-filter R]

OEIS Pretty Printer.

optional arguments:
  -h, --help            show this help message and exit
  --id ID               Sequence id, given in the form Axxxxxx
  --seq SEQ             Literal sequence, ordered '[...]' or presence '{...}'
  --query QUERY         Open query for plain search, in the form '...'
  --most-recents M      Print the most recent sequences ranking by M in ACCESS
                        or MODIFY, looking into --cache-dir, at most --max-
                        results (defaults to None)
  --force-fetch         Bypass cache fetching again, according to --cache-dir
                        (defaults to False)
  --cache-dir CACHE_DIR
                        Cache directory (defaults to ./fetched/)
  --tables-only         Print matrix sequences only (defaults to False)
  --start-index S       Start from result at rank position S (defaults to 0)
  --max-results R       Pretty print the former R <= 10 results (defaults to
                        10)
  --data-only           Show only data repr and preamble (defaults to False)
  --upper-limit U       Upper limit for data repr: U is a dict '{"list":i,
                        "table":(r, c)}' where i, r and c are ints (defaults
                        to i=15, r=10 and c=10), respectively)
  --comment-filter C    Apply filter C to comments, where C is Python `lambda`
                        predicate 'lambda i,c: ...' referring to i-th comment
                        c
  --formula-filter F    Apply filter F to formulae, where F is Python `lambda`
                        predicate 'lambda i,f: ...' referring to i-th formula
                        f
  --xrefs-filter X      Apply filter X to cross refs, where X is Python
                        `lambda` predicate 'lambda i,x: ...' referring to i-th
                        xref x
  --link-filter L       Apply filter L to links, where L is Python `lambda`
                        predicate 'lambda i,l: ...' referring to i-th link l
  --cite-filter R       Apply filter R to citation, where R is Python `lambda`
                        predicate 'lambda i,r: ...' referring to i-th citation
                        r

pprint using cache

the following command pretty prints Fibonacci numbers content with some filters

In [29]:
!python3.6 ../../src/pprinting.py \
    --id A000045 \
    --cache-dir=../../src/fetched/ \
    --comment-filter 'lambda i,c: "Barry" in c' \
    --formula-filter 'lambda i,f: i < 5'
 Fibonacci numbers: F(n) = F(n-1) + F(n-2) with F(0) = 0 and F(1) = 1.

by _N. J. A. Sloane_, 1964

_Keywords_: `nonn,core,nice,easy,hear,changed`

_Data_:
[0  1  1  2  3  5  8  13  21  34  55  89  144  233  377]

_Comments_:
    ● Fib(n+2) = Sum_{k=0..n} binomial(floor((n+k)/2),k), row sums of A046854. -
      _Paul Barry_, Mar 11 2003

_Formulae_:
    ● G.f.: x / (1 - x - x^2).
    ● G.f.: Sum_{n>=0} x^n * Product_{k=1..n} (k + x)/(1 + k*x). - _Paul D.
      Hanna_, Oct 26 2013
    ● F(n) = ((1+sqrt(5))^n-(1-sqrt(5))^n)/(2^n*sqrt(5)).
    ● Alternatively, F(n) = ((1/2+sqrt(5)/2)^n-(1/2-sqrt(5)/2)^n)/sqrt(5).
    ● F(n) = F(n-1) + F(n-2) = -(-1)^n F(-n).

_Cross references_:
    ● Cf. A039834 (signed Fibonacci numbers), A001690 (complement), A000213,
      A000288, A000322, A000383, A060455, A030186, A020695, A020701, A071679,
      A099731, A100492, A094216, A094638, A000108, A101399, A101400, A001611,
      A000071, A157725, A001911, A157726, A006327, A157727, A157728, A157729,
      A167616, A059929, A144152, A152063, A114690, A003893, A000032, A060441,
      A000930, A003269, A000957, A057078, A007317, A091867, A104597, A249548,
      A262342, A001060, A022095.
    ● First row of arrays A103323, A234357. Second row of arrays A099390,
      A048887, and A092921 (k-generalized Fibonacci numbers).
    ● a(n) = A094718(4, n). a(n) = A101220(0, j, n).
    ● a(n) = A090888(0, n+1) = A118654(0, n+1) = A118654(1, n-1) = A109754(0, n)
      = A109754(1, n-1), for n > 0.
    ● Fibonacci-Pascal triangles: A027926, A036355, A037027, A074829, A105809,
      A109906, A111006, A114197, A162741, A228074.
    ● Boustrophedon transforms: A000738, A000744.
    ● Powers: A103323, A105317, A254719.
    ● Numbers of prime factors: A022307 and A038575.
    ● Cf. A163733.

pprint most recent

the following command pretty prints the first 3 results from fetched sequences, according to the rank by most recent access time, reporting data only

In [36]:
!python3.6 ../../src/pprinting.py \
    --cache-dir=../../src/fetched/ \
    --most-recent ACCESS \
    --data-only \
    --max-results 3 \
    --upper-limit '{"list":10}'
 Fibonacci numbers: F(n) = F(n-1) + F(n-2) with F(0) = 0 and F(1) = 1.

by _N. J. A. Sloane_, 1964

_Keywords_: `nonn,core,nice,easy,hear,changed`

_Data_:
[0  1  1  2  3  5  8  13  21  34]

________________________________________________________________________________

 Number of unlabeled rooted trees with n nodes (or connected functions with a
  fixed point).

by _N. J. A. Sloane_

_Keywords_: `nonn,easy,core,nice,eigen`

_Data_:
[0  1  1  2  4  9  20  48  115  286]

________________________________________________________________________________

 Catalan numbers: C(n) = binomial(2n,n)/(n+1) = (2n)!/(n!(n+1)!). Also called
  Segner numbers.

by _N. J. A. Sloane_

_Keywords_: `core,nonn,easy,eigen,nice`

_Data_:
[1  1  2  5  14  42  132  429  1430  4862]

tables datas

the following command pretty prints only two dims sequences

In [4]:
!python3.6 ../../src/pprinting.py \
    --query 'pascal triangle' \
    --cache-dir=../../src/fetched/ \
    --tables-only \
    --data-only \
    --max-results 2
 Pascal's triangle read by rows: C(n,k) = binomial(n,k) = n!/(k!*(n-k)!), 0 <= k
  <= n.

by _N. J. A. Sloane_ and _Mira Bernstein_, Apr 28 1994

_Keywords_: `nonn,tabl,nice,easy,core,look,hear`

_Data_:
⎡1  0  0   0    0    0   0   0   0  0⎤
⎢                                    ⎥
⎢1  1  0   0    0    0   0   0   0  0⎥
⎢                                    ⎥
⎢1  2  1   0    0    0   0   0   0  0⎥
⎢                                    ⎥
⎢1  3  3   1    0    0   0   0   0  0⎥
⎢                                    ⎥
⎢1  4  6   4    1    0   0   0   0  0⎥
⎢                                    ⎥
⎢1  5  10  10   5    1   0   0   0  0⎥
⎢                                    ⎥
⎢1  6  15  20  15    6   1   0   0  0⎥
⎢                                    ⎥
⎢1  7  21  35  35   21   7   1   0  0⎥
⎢                                    ⎥
⎢1  8  28  56  70   56   28  8   1  0⎥
⎢                                    ⎥
⎣1  9  36  84  126  126  84  36  9  1⎦

________________________________________________________________________________

 Sierpiński's [Sierpinski's] triangle (or gasket): triangle, read by rows,
  formed by reading Pascal's triangle mod 2.

by _N. J. A. Sloane_

_Keywords_: `nonn,tabl,easy,nice`

_Data_:
⎡1  0  0  0  0  0  0  0  0  0⎤
⎢                            ⎥
⎢1  1  0  0  0  0  0  0  0  0⎥
⎢                            ⎥
⎢1  0  1  0  0  0  0  0  0  0⎥
⎢                            ⎥
⎢1  1  1  1  0  0  0  0  0  0⎥
⎢                            ⎥
⎢1  0  0  0  1  0  0  0  0  0⎥
⎢                            ⎥
⎢1  1  0  0  1  1  0  0  0  0⎥
⎢                            ⎥
⎢1  0  1  0  1  0  1  0  0  0⎥
⎢                            ⎥
⎢1  1  1  1  1  1  1  1  0  0⎥
⎢                            ⎥
⎢1  0  0  0  0  0  0  0  1  0⎥
⎢                            ⎥
⎣1  1  0  0  0  0  0  0  1  1⎦

notebook interface

also, showing under the hood API

In [5]:
import pprinting
In [6]:
result = pprinting.search(id='A000045',
                          cache_info={'cache_dir':'../../src/fetched'})
In [7]:
result(comment=lambda i,c: i < 4,
       formula=lambda i,f: i < 4)
Out[7]:

Results for query: https://oeis.org/search?fmt=json&start=0&q=id%3AA000045


A000045: Fibonacci numbers: F(n) = F(n-1) + F(n-2) with F(0) = 0 and F(1) = 1.

by N. J. A. Sloane, 1964

Keywords: nonn,core,nice,easy,hear,changed

Data:

$$ \begin{array}{c|ccccccccccccccc } n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 \\ \hline A000045(n) & 0 & 1 & 1 & 2 & 3 & 5 & 8 & 13 & 21 & 34 & 55 & 89 & 144 & 233 & 377 \end{array} $$

Comments:

  • Also sometimes called Lamé's sequence.
  • F(n+2) = number of binary sequences of length n that have no consecutive 0's.
  • F(n+2) = number of subsets of {1,2,...,n} that contain no consecutive integers.
  • F(n+1) = number of tilings of a 2 X n rectangle by 2 X 1 dominoes.

Formulae:

  • G.f.: x / (1 - x - x^2).
  • G.f.: Sum{n>=0} x^n * Product{k=1..n} (k + x)/(1 + k*x). - Paul D. Hanna, Oct 26 2013
  • F(n) = ((1+sqrt(5))^n-(1-sqrt(5))^n)/(2^n*sqrt(5)).
  • Alternatively, F(n) = ((1/2+sqrt(5)/2)^n-(1/2-sqrt(5)/2)^n)/sqrt(5).

Cross references:

grapher

  • represent connections according to cross refs
  • works on your cached sequences
  • draws both digraphs and undirected graphs
  • builds on top of networkx module, layout selection
    • the API allows filtering of both vertices and edges

help & usage

In [8]:
!python3.6 ../../src/graphing.py --help
usage: graphing.py [-h] [--directed] [--cache-dir CACHE_DIR]
                   [--graphs-dir GRAPHS_DIR] [--dpi DPI] [--layout LAYOUT]
                   F

OEIS grapher.

positional arguments:
  F                     Save image in file F.

optional arguments:
  -h, --help            show this help message and exit
  --directed            Draw directed edges
  --cache-dir CACHE_DIR
                        Cache directory (defaults to ./fetched/)
  --graphs-dir GRAPHS_DIR
                        Graphs directory (defaults to ./graphs/)
  --dpi DPI             Resolution in DPI (defaults to 600)
  --layout LAYOUT       Graph layout, choose from: {RANDOM, CIRCULAR, SHELL,
                        FRUCHTERMAN-REINGOLD, SPRING, SPECTRAL} (defaults to
                        SHELL)
In [14]:
!python3.6 ../../src/graphing.py \
    --cache-dir=../../src/fetched/ \
    --graphs-dir=./ \
    --layout SPRING \
    graph.png 
In [15]:
from IPython.display import Image
Image("./graph.png")
Out[15]:

고맙습니다

In [16]:
import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!